Domain-Independent Structured Duplicate Detection
Abstract
The scalability of graph-search algorithms can be greatly extended by using external memory, such as disk, to store generated nodes. We consider structured duplicate detection, an approach to external-memory graph search that limits the number of slow disk I/O operations needed to access search nodes stored on disk by using an abstract representation of the graph to localize memory references. For graphs with sufficient locality, structured duplicate detection outperforms other approaches to external-memory graph search. We develop an automatic method for creating an abstract representation that reveals the local structure of a graph. We then integrate this approach into a domain-independent STRIPS planner and show that it dramatically improves scalability for a wide range of planning problems. The success of this approach strongly suggests that similar local structure can be found in many other graph-search problems.
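The idea of using an abstraction to localize memory references can be illustrated with a small in-memory simulation. The sketch below is only an assumption-laden toy, not the paper's planner: the grid graph, the hand-picked `abstraction` function (the paper instead derives abstractions automatically), and all function names are hypothetical, and "loading" a bucket is simulated rather than performed as real disk I/O. It shows that when nodes in one abstract node are expanded, duplicate checks touch only the buckets of that abstract node's successors in the abstract graph.

```python
from collections import defaultdict


def successors(state, size):
    """Neighbors of a cell in a size x size grid (hypothetical toy graph)."""
    x, y = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield (nx, ny)


def abstraction(state, block):
    """Hand-picked state abstraction: group cells into vertical strips."""
    return state[0] // block


def abstract_graph(size, block):
    """Abstract successors of each abstract node; used as its duplicate-detection scope."""
    edges = defaultdict(set)
    for x in range(size):
        for y in range(size):
            a = abstraction((x, y), block)
            for t in successors((x, y), size):
                edges[a].add(abstraction(t, block))
    return edges


def sdd_bfs(start, size, block):
    """Breadth-first search that expands one bucket at a time.

    Returns (number of nodes stored, largest number of buckets that had
    to be 'in memory' at once). Disk is only simulated here.
    """
    scope = abstract_graph(size, block)
    stored = defaultdict(set)    # bucket id -> nodes already generated
    frontier = defaultdict(set)  # bucket id -> nodes to expand in this layer
    a0 = abstraction(start, block)
    stored[a0].add(start)
    frontier[a0].add(start)
    max_loaded = 0

    while any(frontier.values()):
        next_frontier = defaultdict(set)
        for a in sorted(k for k, v in frontier.items() if v):
            loaded = scope[a]  # only these buckets are needed for duplicate checks
            max_loaded = max(max_loaded, len(loaded))
            for s in frontier[a]:
                for t in successors(s, size):
                    b = abstraction(t, block)
                    assert b in loaded  # every reference stays inside the scope
                    if t not in stored[b]:
                        stored[b].add(t)
                        next_frontier[b].add(t)
        frontier = next_frontier

    return sum(len(v) for v in stored.values()), max_loaded


if __name__ == "__main__":
    visited, max_loaded = sdd_bfs((0, 0), size=8, block=2)
    print(visited, max_loaded)
```

In this toy setting the search stores every cell of the grid while never needing more than a subset of the four buckets at once; how much locality a real problem offers depends on the abstraction chosen.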
Similar resources
Parallel Structured Duplicate Detection
We describe a novel approach to parallelizing graph search using structured duplicate detection. Structured duplicate detection was originally developed as an approach to external-memory graph search that reduces the number of expensive disk I/O operations needed to check stored nodes for duplicates, by using an abstraction of the search graph to localize memory references. In this paper, we sho...
A Domain-Independent Data Cleaning Algorithm for Detecting Similar-Duplicates
Data mining algorithms generally assume that data will be clean and consistent. However, in practice, this is not always the case, and for this reason the detection and elimination of duplicate records is an important part of data cleaning. The presence of similar-duplicate records causes over-representation of data. If the database contains different representations of the same data, the resul...
Fuzzy Duplicate Detection on XML Data
XML is popular for data exchange and data publishing on the Web, but it comes with errors and inconsistencies inherent to real-world data. Hence, there is a need for XML data cleansing, which requires solutions for fuzzy duplicate detection in XML. The hierarchical and semi-structured nature of XML strongly differs from the flat and structured relational model, which has received the main atten...
Detection of Duplicate Objects in Semi Structured Data like XML
Duplicate detection is the process of finding duplicate objects in data. It is an important part of the data cleansing step of data mining. Duplication occurs when a real-world object has multiple representations in a data source. A significant amount of work has been done on duplicate detection in relational data, but only recently have researchers shifted their focus towards duplica...
Unsupervised duplicate detection using sample non-duplicates
The problem of identifying objects in databases that refer to the same real-world entity is known, among other names, as duplicate detection or record linkage. Objects may be duplicates even though they are not identical, due to errors and missing data. Traditional scenarios for duplicate detection are data warehouses, which are populated from several data sources. Duplicate detection here is part ...
Publication date: 2006